Re-Identification of Patient Subgroups in Uveal Melanoma

Uveal melanoma (UM) is a comparatively rare cancer but requires serious consideration since patients with developing metastatic UM survive only for about 6–12 months. Fortunately, increasingly large multi-omics databases allow us to further understand cancer initiation and development. Moreover, previous studies have observed that associations between copy number aberrations (CNA) or methylation (MET) versus messenger RNA (mRNA) expression have affected these processes. From that, we decide to explore the effect of these associations on a case study of UM. Also, the current subtypes of UM display its weak association with biological phenotypes and its lack of therapy suggestions. Therefore, the re-identification of molecular subtypes is a pressing need. In this study, we recruit three omics profiles, including CNA, MET, and mRNA, in a UM cohort from The Cancer Genome Atlas (TCGA). Firstly, we identify two sets of genes, CNAexp and METexp, whose CNA and MET significantly correlated with their corresponding mRNA, respectively. Then, single and integrative analyses of the three data types are performed using the PINSPlus tool. As a result, we discover two novel integrative subgroups, IntSub1 and IntSub2, which could be a useful alternative classification for UM patients in the future. To further explore molecular events behind each subgroup, we identify their subgroup-specific genes computationally. Accordingly, the highest expressed genes among IntSub1-specific genes are mostly enriched with immune-related processes. On the other hand, IntSub2-specific genes are highly associated with cellular cation homeostasis, which responds effectively to chemotherapy using ion channel inhibitor drugs. In addition, we detect that the two integrative subgroups show different age-related risks and survival rates. These discoveries can influence the frequency of metastatic surveillance and support medical practitioners to choose an appropriate treatment regime.


INTRODUCTION
Uveal melanoma (UM) is a comparatively rare cancer formed from melanocytes within the uveal tract of the eye involving either in the iris, ciliary body, or mostly choroid (1) and responsible for about five cases per million per year (2). Although current first-line treatment approaches receive good results for this malignancy, specifically, UM patients can live longer, but we want to improve early diagnosis more with the hope of raising overall patient survival as smaller tumors are treated, resulting in achieving local disease control and vision preservation with the possibility to prevent metastases (3). However, it has still remained challenging. Indeed, UM patients with the metastatic disease only lived for approximately 6-12 months (4). This emphasizes a pressing need of improving the diagnosis, prevention, and treatment of UM patients.
Besides, several recent large-scale and multi-omics databases have enabled us to see associations between the genetic or epigenetic alterations versus the tumorigenesis and progression of UM. For example, the importance of different types of RNA such as mRNA, microRNA (miRNA), and long non-coding RNA (lnCRNA) was investigated in UM (5,6). Based on an in silico and experimental biology, lnCRNA LINC00518 was identified to be a oncogene in UM and could be used in RNA-based therapeutic approaches as a promising target (6). Additionally, UM has frequently had copy number aberrations (CNA) gain regions of chromosomes 6p and 8q as well as loss regions of chromosomes 1p, 3, 6q, 8p, and 16q (7,8). Particularly, BAP1 mutations related to chromosome 3 monosomy and SF3B1 and SRSF2 alterations related to chromosome 3 disomy contributed to high risk of metastasis. Meanwhile, mutations on EIF1AX related to chromosome 3 disomy were associated with low metastatic risk (9). In addition, Yang et al. (4) have made a comprehensive review of the role of DNA methylation in the development and metastasis of UM. They highlighted that several tumor suppressor genes comprising RASSF1A and p16INK4a have been altered by DNA methylation (MET) and contributed to controlling cell migration and invasion in UM. Moreover, p16INK4a expression was reported in all UM liver metastatic cases and may have potential in discriminating UM and cutaneous melanoma (10). Besides, the autophagy has been hypothesized to have a role in inhibiting tumor growth when investigating this process-related protein, Beclin-1. The high level of immunohistochemistry in Beclin-1 was found to be a positive prognosis of UM patients (11).
Moreover, multiple prior studies have been conducted to stratify UM patients using various kinds of -omic data. Among them, the most popular work proposed by Robertson et al. (5) has conducted a multiplatform analysis of 80 UM patients using only one single data type of omics data, including mRNA expression, miRNA, long non-coding RNA, MET, and CNA, and successfully identified four different subtypes: two associated with poor-prognosis monosomy 3 (M3) and the others with better-prognosis disomy 3 (D3). However, we claim that not a single data alone but instead integrated omics data are powerful enough to explain the interplay of molecules and the biological phenotypes of cancer holistically (12)(13)(14). This motivates us to do this study in order to discover novel subgroups of UM patients that adopt an integrative approach.
In this study, we aimed to analyze three omics profiles, namely, CNA, MET, and mRNA, in a UM cohort from The Cancer Genome Atlas (TCGA). To this purpose, we identified the significant correlation between CNA and MET versus their own corresponding expression levels ( Figure 1). It was of importance to note that the omics experiments were conducted with thousands of simultaneous hypothesis tests (15). Therefore, the adjusted P-value using the Benjamini-Hochberg procedure (16) as a measure of significant tests controlling the number of false discoveries was necessarily considered in this work. Then, single and joint analyses of the three data types were performed using the tool PINSPlus (17,18). As a result, we discovered two novel integrative subgroups, IntSub1 and IntSub2, which could be potentially a future classification system for UM patients. These discoveries could influence the frequency of metastatic surveillance and support medical practitioners to choose an appropriate treatment regime.

Materials
The three datasets, namely, CNA, MET, and mRNA expression, were collected from the TCGA project (TCGA, Firehose Legacy) (5) and downloaded from the cBioPortal website (19,20). The UM cohort is described in Table 1.

Data Preprocessing
There were two preprocessing steps applied to the three profiles (i.e., mRNA, CNA, and MET) from the data. We first checked if the 80 patients from each of the three profiles and clinical data were matched. Then, we detected genes whose missing values were more than 50% using the k-nearest neighbor algorithm (21) from the CancerSubtypes package (version 1.14.0) (22).

Identification and Examination of the Relationship of CNAexp and METexp Genes
Here, we kept only genes shared between CNA and mRNA, as well as between MET and mRNA. To identify and examine the relationship of CNAexp and METexp genes, we used the R tool geneCor (14). Roughly, the tool first computed the correlation coefficients (r) between MET and mRNA, as well as between CNA and mRNA based on Spearman's rank method, and then, the conversion of significant r (i.e., adjusted P-value ≤ 0.05; Benjamini-Hochberg (16); two-sided) into Z values by Fisher's Z-transformation following the equation: Z = 0.5 ln[(1 + r)/(1 − r]). Secondly, the overall distributions of calculating Z values were pictured automatically. Thirdly, geneCor computed the skewness of the Z-score distributions using the D'Agostino test. The overall skewness illustrated whether CNA or MET was correlated positively or negatively with their own corresponding mRNA. Parallelly, geneCor also issued two sets of genes, CNAexp and METexp, whose CNA and MET significantly correlated with their corresponding mRNA expression levels, respectively. Further analysis was performed using FSbyCOX in the package CancerSubtypes (version 1.14.0) (22) to only retain a small number of genes associated significantly with a prognostic value (P-value ≤ 0.0005; log-rank test; two-tailed) in the two gene sets (i.e., CNAexp and METexp).

Single and Integrated Subtyping
The related study proposed by Robertson et al. (5) found the four different molecular groups based on highly expressed genes, CNA and MET, separately. We hypothesized that an integrative clustering analysis, comprising the three profiles above, would be a more powerful approach. Moreover, our clustering tool, PINSPlus (version 2.0.5) (17,18), demonstrated its great ability in cancer subtyping, in general, using multi-omics data. Especially, it classified breast cancer patients into two subgroups that have possessed biologically and clinically meaningful properties (14). We, therefore, continued applying this tool to seeking the optimal group number of UV patients. In this study, we kept all the parameters of PINSPlus as default (i.e., clustering method was k-means); except for the number of candidate groups, k was set to a range from 2 to 10. The area under the receiver operating characteristic (AUC) value allowed us to choose the optimal k.

Subgroup-Specific Gene Determination and Enrichment Analysis
To observe the biological differences between identified UM subgroups, we sought to discover the subtype-specific genes using the package GeneCluster (version 0.1.0) (14). Given the lists of genes (i.e., METexp and CNAexp), this tool computed the mean expression level of each gene in each identified patient subgroup across all samples. Then, the gene whose mean expression value was the highest will be allotted to a cluster if the P-value ≤0.05 (one-way ANOVA test; two-sided). Finally, the gene will be recognized officially as belonging to that subtype if  Overall survival (OS) status was defined as vital status (dead or alive), whereas OS time was identified as the time to UM death or last follow-up (unit: day). The follow-up time OS was truncated to 2,600 days.
FIGURE 1 | Analysis pipeline. Firstly, we inputted CNA and MET datasets with their corresponding mRNA data to the function geneCor to identify a list of CNAexp and METexp genes, respectively. Then, PINPlus was used to extract different patient subgroups for individual CNAexp and METexp datasets and integration of CNAexp + METexp + mRNA data through single and integrated analyses, respectively. Finally, we discovered subtype-specific genes within each identified integrated subgroup, IntSub1 and IntSub2, using the R package GeneCluster. UM, uveal melanoma.
the adjusted P-value ≤0.05 (Benjamini-Hochberg procedure (16); two-tailed). Subsequently, in order to investigate further the biological themes from the gained subgroup-specific genes, we implemented the enrichment analysis using the DAVID tool (version 6.8) (23,24). Also, the output was concentrated into functional-related gene groups or different meaningful terms that were convenient to translate into the clinic. The significance levels of these terms were assessed based on P-value (Fisher's exact test). In other words, a list of genes with a smaller P-value was more overrepresented and had a stronger association to the subtype phenotypes.

Identification and Examination of the Relationship of CNAexp and METexp Genes
Our tool geneCor provided us with the two sets comprising 4,139 CNAexp genes and 8,157 METexp genes (see Supplementary  Table S1). As pictured in Figure 2A, the CNAexp genes were significantly skewed to the right (skewness = 1.3511, P-value < 2.2 × 10 −16 ; D'Agostino test; two-sided) consistent with the results reported in (25), while the METexp genes were significantly skewed to the left (skewness = −0.3419, P-value < 2.2 × 10 −16 ; D'Agostino test; two-sided) consistent with the results reported in (26). This indicated that there was a consistently converse relation of mRNA with CNA and MET genes. As mentioned, we truncated genes per the gene set above (i.e., CNAexp and METexp) based on the association with the OS of patients. Particularly, due to an overwhelming number of genes in each set, we only preserved genes per set if P-value <0.0005. Finally, 179 CNAexp genes and 859 METexp genes were obtained. It was a weak intersection (50 genes) between CNAexp and METexp, indicating that the CNAexp and METexp were two poorly non-disjoint events ( Figure 2B). Figure 2C shows the frequency of the CNAexp or METexp genes against the total count of genes in each chromosome arm. Of particular interest, CNAexp only distributed in two chromosomes 3 and 8, especially almost in chromosome 8, implying not only a poor prognosis but also a considerably reduced survival (27)(28)(29)(30). Also, we could observe that the METexp genes displayed a regional genomic preference for MET, particularly on chromosome 3, involving in high metastatic risk (26).

Single and Integrated Subtyping
As described in the Materials and Methods section, we implemented the single clustering analyses for CNAexp and METexp, separately. For METexp, the k of two with the AUC of 1.0000 was optimal ( Figure 3A). Similarly, for CNAexp, the same k and AUC were also optimal again ( Figure 3A). Notably, the number of patients assigned to either of the two CNAexp subgroups significantly overlapped with that of the two METexp subgroups (P-value = 3.6714 × 10 −15 ; c 2 test; two-sided; Figure 3B). The heatmap shows the expression patterns of CNAexp subgroups and METexp subgroups from integrated analysis by PINSPlus (Supplementary Figure S1). Moreover, the association between our integrated subgroups, IntSub1 and IntSub2, versus patient subtypes in (5) using mRNA data is also shown in Supplementary Figure S2. Interestingly, IntSub1 was divided almost into subgroups 1 to 3, whereas most patients in IntSub2 belonged previously to subtype 4. We then employed the survival analysis for the acquired subgroups of CNAexp and METexp. The two CNAexp subgroups were revealed to be statistically meaningful to the OS (P-value = 1.7844 × 10 −5 ; two-sided; Figure 3C). Also, with the Cox P-value = 1.1006 × 10 −6 , the two METexp subgroups were significantly correlated with the OS ( Figure 3C). These results told us that the data single clustering strategy seemed to be effective  in this case. However, the given single analyses might only show the results that reflected the solitary aberration in UM pathology. Next, the integrative clustering analysis was leveraged for a combination of CNAexp, METexp, and mRNA gene sets in a similar manner with the single clustering analysis above. Interestingly, PINSPlus classified UM patients into two integrative subgroups called IntSub1 (n = 60) and IntSub2 (n = 20) ( Figure 3D). Specially, they were consistent significantly with the single subgroups of the CNAexp dataset (P-value = 1.0266 × 10 −5 ; c 2 test; two-sided; Figure 3E) and the METexp dataset (P-value = 9.3057 × 10 −7 ; c 2 test; Figure 3E). On top of that, we then investigated the survival analysis which revealed that the two integrated subgroups possessed statistically different factors for the survival of UM patients (P-value = 4.0228 × 10 −5 ; log-rank test; Figure 3F). Also, in Figure 3F, the patients in IntSub2 were significantly worse than those in IntSub1 (hazard ratio of 6.1204 and 95% confidence interval between 2.5970 and 14.4200; IntSub1 was reference; log-rank test). Also, we reviewed the statistical descriptions for UM patients, containing age, gender, tumor stages, and metastasis status, between the IntSub1 and IntSub2 provided in Supplementary Table S2. These results bolstered our confidence in the effectiveness of our previous strategy (14) in discovering the novel UM patient subgroups under the perspective of integration.

Determination of Subgroup-Specific Genes
As mentioned earlier, the GeneCluster tool was leveraged to exploit subtype-specific gene lists. Accordingly, we extracted three subgroup-specific gene lists for the two integrative subgroups using three kinds of profiles: mRNA, CNAexp, and METexp. Specifically, these lists were established on average mRNA expression levels (IntSub1: 347 genes and IntSub2: 431  genes; Supplementary Table S3 Table S6). This indicated that the UM pathology had a strong connection to the abnormally expressed genes related to immune cells. Interestingly, we found that that the highest expressed gene based on copy number aberrations, SLCO5A1, was identified to associate with poor outcome (32), which could be a prospective interpretation for the worse prognosis of IntSub2 patients compared with those in IntSub1. Notably, SLCO5A1 was considered as a prognosis gene correlated with the immune infiltrates. The immune cell infiltration level was noted to be a crucial factor in predicting the UM prognosis (33). Supplementally, we sought out that BAP1 was associated with abnormal DNA methylation within IntSub2 samples rather than other subtypes. It was reported that about 22% of familial UM  cases found the muted BAP1. BAP1 mutations raised not only a large tumor diameter percentage but also the metastasis risk in UM patients. This indicated that BAP1 testing is a reasonable recommendation for hereditary melanoma (34). Additionally, PTP4A3, the most overexpressed gene ranked by mean expression value among specific genes of IntSub2, was defined as a marker of poor prognosis involved in cell migration and metastatic progression (35). Furthermore, metastasis is a confident signal of the poor outcome, resulting in death in most UM cases (36).

Enrichment Analysis Using the DAVID Tool
We next performed the enrichment analysis as described above with the given subgroup-specific genes. Remarkably, the top biological processes for IntSub1-specific CNAexp genes included endonuclease activity and interleukin-17 receptor activity and transcription factor binding (Supplementary Table S7 and Supplementary Figure S3); IntSub1-specific METexp genes were associated with the positive regulation of cell migration, immune effector process, and positive regulation of hydrolase activity (Supplementary Table S8 and Supplementary Figure  S4). Conversely, the IntSub2 was characterized most in cellular cation homeostasis embracing, especially, the regulation of pH and the regulation of calcium ion in the CNAexp profile (Supplementary Table S7 and Supplementary Figure S5). Also, the IntSub2 was distinguished by common abnormalities of METexp genes related to the regulation of gene expression and cellular macromolecule biosynthetic process (Supplementary Table S8 and Supplementary Figure S6).
In this study, we also compared the subgroup-specific genes from the two lists: mRNA (Supplementary Table S3) and CNAexp (Supplementary Table S4) with the FoundationOne CDx (updated on June 15, 2020) that included 321 genes relating closely to cancer and participating in the process of tumorigenesis. Consequently, we revealed 22 subgroup-specific mRNA expression genes (bold red gene names in Supplementary Table S3) and eight subgroupspecific CNAexp genes (bold red gene names in Supplementary  Table S4) included in the database above. Collectively, our results reinforced the clinical association between the obtained subgroupspecific genes and melanoma formation.

Prognostic Factor Identification
We then sought to conduct the age at diagnosis and survival time analyses in order to define the prognosis factor of two UM subtypes. The results are shown in Table 2. It is worth mentioning that 60-year-old or older patients were highly risky to have UM. In addition, there was a distinct difference in the average survival day between IntSub1 and IntSub2 patients: 885.2667 and 617.0000 days, respectively. This indicated that the OS of UM patients could be foreknown dependent partly on which subgroup a patient is assigned to, to some extent. Besides, the patients in the IntSub1 were characterized by the average age of 60.3333 as well as the average OS of 885.2667 days, whereas those numbers in the IntSub2 were 65.6000 years old and 617.0000 days. Obviously, although the average age of the patients in the IntSub1 was only 5 years younger than that of their counterparts in IntSub2, they could live about 9 months longer than the patients in IntSub2. These results should be understood that age-related risks and survival rates might be separate in these integrative subgroups. For a better understanding, we took into account the risk of the two age groups in each subgroup comprising the mid-adults (21-65 years) and the older adults (>65 years) from the 80 UM patients (22-86 years old) in the clinical data. The reason we chose the threshold of 65 years old was because Figure 4A illustrates a bimodal age distribution, implying that we had two groups naturally.
The two age groups, the non-old group and the old group, in each subgroup were interrogated by the survival analyses.
Observing the results reported in Figure 4B, we revealed a significant survival difference between the two age groups in the IntSub2, whereas no statistical significance in patient outcome between the two age groups was seen in the IntSub1, indicating that age factor could be a risk factor to predict the survival time.

DISCUSSION AND CONCLUSION
Recently, genomic profiling at multiple levels (e.g., genomics, epigenomics, transcriptomics) has been boomed (37). The abundant omics type of data has been easily accessed from public databases like TCGA facilitating a better understanding of molecular events behind cancer progression. Additionally, based on the associations between the three types of omics data (mRNA, MET, and CNA), we successfully classified breast cancer into two patient subsets which improved the weak manifestations of the intrinsic subtypes, especially in association with the biological phenotype in a prior work. With these concerns in mind, we have decided to apply this successful framework to a rare cancer like UM.
Here, we defined the two lists of CNAexp and METexp based on the correlations of CNA and MET with their mRNA at first. The resulting lists are leveraged to stratify not only individually but also integratively the 80 UM patients using the PINSPlus tool. We revealed the two molecular subgroups (IntSub1 and IntSub2) along with their subtype-specific genes that help to uncover significantly different clinical characteristics as well as patient outcomes. Importantly, there existed several poorly prognostic genes (SLCO5A1, BAP1, and PTP4A3) which could lead to shorter OS of IntSub2 patients. We next recruited the DAVID tool to perform the enrichment analysis in each integrated clustering. Notably, the IntSub1 showed the overexpression of genes enriched significantly in the immune system process (Supplementary Table S4). Besides, the IntSub2 displayed CNAexp genes known to be key factors in cellular cation homeostasis and regulation of calcium. These findings are likely to help oncologists and physicists find out distinct treatment strategies for the two subgroups. The finding of these subgroups could be a suggestion in clinical application for UM treatment. For example, in the IntSub1, the IL-17 (IL17RE, IL17RD, and IL17RC) played vital roles in immune responses which stimulated the tumor growth and repressed the antitumor activity (38). Fabre et al. (39) affirmed in their study that the IL-17/IL-17R axis could be a novel immunotherapeutic target relevant to the antitumor purpose. Besides, the dense appearance of mutated genes is enriched in the cellular cation homeostasis group (i.e., K + , Ca 2+ , Na + , and H + ). Cell proliferation and apoptosis were regulated by various cation channels. For instance, K + channels participated in the stimulation of the cell end, thus declining the cell number. Therefore, the changeable potassium channels contributed to the malignant expression of cancer (40). In the cellular cation homeostasis gene group, SGK3 played an activation role of potassium channels (41). Moreover, several prior studies showed the promising therapy of K + channel blocking in cancer treatment. This enhanced the consideration of using drugs inhibiting the potassium channels as chemotherapy for UM patients. As an example, astemizole was repositioned in its use by blocking the EAG1 channel which was one of the major potassium channels and brought remarkable efficacy for cancer cell growth (42). Alternatively, the small molecule which was able to block, inhibit, or regulate the calcium ion transport was reported to be a potential anticancer drug, such as brilliant blue G, oxidized ATP for melanoma cases (43). Taken together, targeted therapies may be efficient for the IntSub1 subgroup, while the combination of the cation channel blocker and chemotherapeutic drugs has the potential for IntSub2 patients.
In addition, we saw that the baselines of both IntSub1 and IntSub2 subgroups varied depending potentially on several clinical features being vital factors for prognosis. Thus, the survival comparison between the two subgroups was further interrogated by utilizing a multivariate Cox regression model in terms of age groups, tumor stages, gender, and histology cell type comparisons. The analysis results are shown in Supplementary Table S8. As a consequence, old age groups, tumor stage IV, and histology cell type comparison between spindle cell and predominant mixed spindle cell were considered as significantly independent prognostic factors. Furthermore, some powerful predictive genes (except BAP1) for prognosis used in clinical routine in UM are not identified by our strategy. This can be regarded as a potential restriction of our work when deliberately leveraging the power of integration of multiomics data. The following are several factors giving rise to the poor performance of our strategy. The first factor can be the "curse of dimensionality" being a typical problem when using multimodal data. Another factor can be possibly due to the different nature of data types. Most of the statistical tools only work well on continuous data, whereas the minority of them do well on discrete data. In this study, we have combined the two types.
In conclusion, multi-omics data integration contributes to dealing with the bottleneck in getting insights into complex multi-mechanism diseases like cancer in general and UM in particular. We determined the two clinically and molecularly distinct integrative subgroups, IntSub1 and IntSub2, which not only can be a potential alternative classification system in the future but also give more effective suggestions for UM treatment.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
THYN drafted the manuscript, which was edited by all coauthors. Q-HN conceived and designed the approach, coded the geneCor and GeneCluster tools, and wrote the R codes for analyses. THYN ran the codes. THYN and Q-HN analyzed